
bcftools filter -O z \
    -o filtered2_sarscov2.vcf.gz \
    -i 'DP>300' filtered_sarscov2.vcf.gz

You can open the filtered VCF file to see the changes. Note that the filter command is also recorded in the VCF header.

In practice, you can apply several different filters to the variants in a VCF file to obtain accurate and reliable results.
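To make the effect of the depth filter concrete, the following is a minimal Python sketch of what an inclusion expression such as 'DP>300' does to the records of a VCF file. It assumes that DP refers to the DP key of the INFO column, works on uncompressed text lines, and uses made-up records; the function names are hypothetical, and this is not how bcftools is implemented.

```python
def info_depth(vcf_line):
    """Return the INFO/DP value of a VCF data line, or None if it is absent."""
    info = vcf_line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value:
            return int(value)
    return None


def keep_by_depth(vcf_lines, min_depth=300):
    """Keep header lines and records whose INFO/DP is strictly above min_depth."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines are always passed through
            continue
        depth = info_depth(line)
        if depth is not None and depth > min_depth:
            kept.append(line)
    return kept


# Hypothetical records, for illustration only.
records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=120;AF=0.5",
    "chr1\t200\t.\tC\tT\t99\tPASS\tDP=450;AF=1.0",
]
filtered = keep_by_depth(records)
for line in filtered:
    print(line)
```

Only the record at position 200 passes the depth threshold in this example; the two header lines are kept unchanged.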

Another way to filter variants is to use “bcftools isec”, which takes a VCF file of truth variants together with your raw VCF file as input and creates intersections, unions, and complements of the VCF files.

bcftools isec -c both -p isec truth.vcf.gz input.vcf.gz

Refer to bcftools help for more details.
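The idea of intersections and complements can be illustrated with plain Python sets. The sketch below is only a conceptual model: it matches records on exact chromosome, position, REF, and ALT, which is a simplification of how bcftools decides that two records are the same, and the records and function name are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from the data lines of a VCF file."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


# Hypothetical truth and call sets, for illustration only.
truth_lines = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
]
input_lines = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t200\t.\tC\tT",
    "chr1\t300\t.\tG\tA",
]

truth = variant_keys(truth_lines)
calls = variant_keys(input_lines)

shared = truth & calls        # intersection: called variants confirmed by the truth set
only_truth = truth - calls    # complement: truth variants that were not called
only_calls = calls - truth    # complement: called variants absent from the truth set
everything = truth | calls    # union of both files

print(sorted(shared))
print(sorted(only_truth))
print(sorted(only_calls))
```

Keeping only the shared variants corresponds to using the truth set as a filter on the raw calls.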

4.2.2  Haplotype-Based Variant Callers

Haplotype-based variant calling programs usually use a Bayesian probabilistic model to predict variants from aligned reads based on the haplotype structure of the variants rather than on the sequence alignment alone. A haplotype is a set of genetic variants that are inherited together. Haplotype-based variant detection depends on physical phasing, which is the process of inferring haplotype structure from genotypic data using the Bayesian approach. The prediction relates the probability of a specific genotype given a set of reads to the likelihood of sequencing errors in the reads and the prior likelihood of specific genotypes. The Bayesian haplotype-based approach allows modeling of multiallelic loci, which enables direct detection of longer, multi-base alleles from the sequence alignment.
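The relationship between genotype probability, sequencing error, and genotype priors can be sketched with Bayes' rule. The example below is deliberately simplified to a single biallelic site in a diploid sample, whereas real haplotype-based callers evaluate candidate haplotypes; the error rate, the prior values, and the function name are assumptions chosen for illustration.

```python
import math


def genotype_posteriors(reads, error_rate=0.01, priors=None):
    """Posterior probability of each diploid genotype given observed read alleles.

    reads is a list of "R" (reference allele) or "A" (alternate allele).
    """
    if priors is None:
        # Hypothetical priors: most sites are homozygous reference.
        priors = {"RR": 0.98, "RA": 0.015, "AA": 0.005}
    # Fraction of the sample's chromosomes that carry the alternate allele.
    alt_fraction = {"RR": 0.0, "RA": 0.5, "AA": 1.0}

    log_joint = {}
    for genotype, frac in alt_fraction.items():
        # Probability that a read shows "A", allowing for sequencing error.
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        log_likelihood = sum(
            math.log(p_alt if read == "A" else 1 - p_alt) for read in reads
        )
        log_joint[genotype] = math.log(priors[genotype]) + log_likelihood

    # Normalize in log space to avoid underflow with many reads.
    peak = max(log_joint.values())
    weights = {g: math.exp(v - peak) for g, v in log_joint.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Six reads support the alternate allele and six support the reference.
reads = ["A"] * 6 + ["R"] * 6
posteriors = genotype_posteriors(reads)
best = max(posteriors, key=posteriors.get)
print(best, posteriors)
```

With an even split of reads, the heterozygous genotype receives nearly all of the posterior probability even though its prior is small, because explaining six reads as sequencing errors is far less likely.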

Using reads aligned to a reference sequence (BAM) as input, a haplotype-based algorithm first attempts to identify the active regions of variation in the reads aligned to the reference genome. The active regions are identified with dynamic sliding windows of a certain size along the reference sequence. The number of events (mismatches, InDels, and soft clips) is counted in each window. When the number of events in that

FIGURE 4.5  VCF file containing filtered variants.